Releases: Mozilla-Ocho/llamafile
llamafile v0.4.1
llamafile lets you distribute and run LLMs with a single file
If you had trouble generating filenames while following the "bash one-liners"
blog post with the latest release, please try again.
- 0984ed8 Fix regression with --grammar flag
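As a quick sanity check, here is a sketch of the kind of one-liner the blog post describes (the model filename and grammar below are illustrative, not taken from these notes):

```sh
# Constrain generation to a plausible filename using a GBNF grammar.
./llamafile -m mistral-7b-instruct-v0.2.Q4_K_M.gguf \
  --grammar 'root ::= [a-z0-9-]+ ".txt"' \
  --silent-prompt \
  -p 'Suggest a good filename for my notes about quantum computing: '
```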
Crashes on older Intel / AMD systems should be fixed:
- 3490afa Fix SIGILL on older Intel/AMD CPUs w/o F16C
The OpenAI API compatible endpoint has been improved.
- 9e4bf29 Fix OpenAI server sampling w.r.t. temp and seed
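For reference, a request that exercises those parameters might look like this (a sketch assuming `llamafile-server` is listening on the default port 8080; the model name is just a placeholder):

```sh
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "LLaMA_CPP",
    "temperature": 0.7,
    "seed": 42,
    "messages": [{"role": "user", "content": "Say this is a test."}]
  }'
```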
This release improves the documentation.
- 5c7ff6e Improve llamafile manual
- 658b18a Add WSL CUDA to GPU section (#105)
- 586b408 Update README.md so links and curl commands work (#136)
- a56ffd4 Update README to clarify Darwin kernel versioning
- 47d8a8f Fix README changing SSE3 to SSSE3
- 4da8e2e Fix README examples for certain UNIX shells
- faa7430 Change README to list Mixtral Q5 (instead of Q3)
- 6b0b64f Fix CLI README examples
We're making strides toward automating our testing process.
Some other improvements:
- 9e972b2 Improve README examples
- 9de5686 Support bos token in llava-cli
- 3d81e22 Set logger callback for Apple Metal
- 9579b73 Make it easier to override CPPFLAGS
Our .llamafiles on Hugging Face have been updated to incorporate these
new release binaries. You can redownload here:
- https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main
- https://huggingface.co/jartine/Mistral-7B-Instruct-v0.2-llamafile/tree/main
- https://huggingface.co/jartine/wizardcoder-13b-python/tree/main
- https://huggingface.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile
Known Issues
LLaVA image processing with the built-in tinyBLAS library may be slow on Windows.
Here's the workaround for using the faster NVIDIA cuBLAS library instead:
- Delete the `.llamafile` directory in your home directory.
- Install CUDA
- Install MSVC
- Open the "x64 MSVC command prompt" from Start
- Run llamafile there for the first invocation.
There's a YouTube video tutorial on doing this here: https://youtu.be/d1Fnfvat6nM?si=W6Y0miZ9zVBHySFj
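In terms of commands, the workaround looks roughly like this (a sketch run from that MSVC command prompt after installing CUDA and MSVC; filenames are illustrative):

```bat
rem Remove the cached GPU support objects so llamafile rebuilds them against cuBLAS.
rmdir /s /q "%USERPROFILE%\.llamafile"
rem The first run compiles the CUDA module; -ngl 35 offloads layers to the GPU.
llamafile.exe -m llava-v1.5-7b-Q4_K.gguf --mmproj llava-v1.5-7b-mmproj-Q4_0.gguf ^
  --image lemurs.jpg -ngl 35 -p "Describe this image."
```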
llamafile v0.4
llamafile lets you distribute and run LLMs with a single file
This release features Mixtral support. Support has been added for Qwen
models too. The `--chatml`, `--samplers`, and other flags have been added.
- 820d42d Synchronize with llama.cpp upstream
GPU now works out of the box on Windows. You still need to pass the
`-ngl 35` flag, but you're no longer required to install CUDA/MSVC.
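For example, a sketch of GPU inference on Windows with this release (the model filename and prompt are illustrative):

```sh
# -ngl 35 offloads 35 layers to the GPU; no CUDA/MSVC install is needed for tinyBLAS.
llamafile -m mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf -ngl 35 \
  -p '[INST]Why is the sky blue?[/INST]'
```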
- a7de00b Make tinyBLAS go 95% as fast as cuBLAS for token generation (#97)
- 9d85a72 Improve GEMM performance by nearly 2x (#93)
- 72e1c72 Support CUDA without cuBLAS (#82)
- 2849b08 Make it possible for CUDA to extract prebuilt DSOs
Additional fixes and improvements:
- c236a71 Improve markdown and syntax highlighting in server (#88)
- 69ec1e4 Update the llamafile manual
- 782c81c Add SD ops, kernels
- 93178c9 Polyfill $HOME on some Windows systems
- fcc727a Write log to /dev/null when main.log fails to open
- 77cecbe Fix handling of characters that span multiple tokens when streaming
Our .llamafiles on Hugging Face have been updated to incorporate these
new release binaries. You can redownload them using the links listed in the v0.4.1 release above.
llamafile v0.3
llamafile lets you distribute and run LLMs with a single file
The `llamafile-main` and `llamafile-llava-cli` programs have been
unified into a single command named `llamafile`. Man pages now exist in
PDF, troff, and PostScript format. There's much better support for shell
scripting, thanks to a new `--silent-prompt` flag. It's now possible to
shell script vision models like LLaVA using grammar constraints.
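For instance, a sketch of the kind of shell scripting this enables (filenames are illustrative; the grammar forces a yes/no answer):

```sh
./llamafile -m llava-v1.5-7b-Q4_K.gguf \
  --mmproj llava-v1.5-7b-mmproj-Q4_0.gguf \
  --image lemurs.jpg \
  --silent-prompt \
  --grammar 'root ::= "yes" | "no"' \
  -p 'Is there a lemur in this image? Answer yes or no.'
```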
- d4e2388 Add --version flag
- baf216a Make ctrl-c work better
- 762ad79 Add `make install` build rule
- 7a3e557 Write man pages for all commands
- c895a44 Remove stdout logging in llava-cli
- 6cb036c Make LLaVA more shell script friendly
- 28d3160 Introduce --silent-prompt flag to main
- 1cd334f Allow --grammar to be used on --image prompts
The OpenAI API in `llamafile-server` has been improved.
- e8c92bc Make OpenAI API `stop` field optional (#36)
- c1c8683 Avoid bind() conflicts on port 8080 w/ server
- 8cb9fd8 Recognize cache_prompt parameter in OpenAI API
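A request taking advantage of those changes might look like this (a sketch; `cache_prompt` is a llama.cpp server extension accepted alongside the standard OpenAI fields, and the `stop` field is simply omitted):

```sh
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "LLaMA_CPP",
    "cache_prompt": true,
    "messages": [{"role": "user", "content": "Summarize llamafile in one sentence."}]
  }'
```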
Performance regressions have been fixed for Intel and AMD users.
- 73ee0b1 Add runtime dispatching for Q5 weights
- 36b103e Make Q2/Q3 weights go 2x faster on AMD64 AVX2 CPUs
- b4dea04 Slightly speed up LLaVA runtime dispatch on Intel
The `zipalign` command is now feature complete.
- 76d47c0 Put finishing touches on zipalign tool
- 7b2fbcb Add support for replacing zip files to zipalign
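For context, here's a sketch of the packaging workflow `zipalign` supports (filenames are illustrative, and the flags shown are assumptions based on the project's documented usage rather than part of these notes):

```sh
# Start from the bare llamafile binary and embed weights plus default arguments.
cp /usr/local/bin/llamafile mistral.llamafile
cat > .args <<'EOF'
-m
mistral-7b-instruct-v0.2.Q4_K_M.gguf
...
EOF
# -j0 is assumed to store the files uncompressed so the weights can be memory-mapped.
zipalign -j0 mistral.llamafile mistral-7b-instruct-v0.2.Q4_K_M.gguf .args
```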
Some additional improvements:
- 5f69bb9 Add SVG logo
- cd0fae0 Make memory map loader go much faster on MacOS
- c8cd8e1 Fix output path in llamafile-quantize
- dd1e0cd Support attention_bias on LLaMA architecture
- 55467d9 Fix integer overflow during quantization
- ff1b437 Have makefile download cosmocc automatically
- a7cc180 Update grammar-parser.cpp (#48)
- 61944b5 Disable pledge on systems with GPUs
- ccc377e Log cuda build command to stderr
Our .llamafiles on Hugging Face have been updated to incorporate these new release binaries. You can redownload here:
- https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main
- https://huggingface.co/jartine/mistral-7b.llamafile/tree/main
- https://huggingface.co/jartine/wizardcoder-13b-python/tree/main
If you have a slower Internet connection and don't want to re-download, then you don't have to! Instructions are here:
llamafile v0.2.1
llamafile lets you distribute and run LLMs with a single file. See our README file for documentation and to learn more.
Changes
- 95703b6 Fix support for old Intel CPUs
- 401dd08 Add OpenAI API compatibility to server
- e5c2315 Make server open tab in browser on startup
- 865462f Cherry pick StableLM support from llama.cpp
- 8f21460 Introduce pledge() / seccomp security to llama.cpp
- 711344b Fix server so it doesn't consume 100% cpu when idle
- 12f4319 Add single-client multi-prompt support to server
- c64989a Add --log-disable flag to server
- 90fa20f Fix typical sampling (#4261)
- e574488 `reserve` space in `decode_utf8`
- 481b6a5 Look for GGML DSO before looking for NVCC
- 41f243e Check for i/o errors in httplib read_file()
- ed87fdb Fix uninitialized variables in server
- c5d35b0 Avoid CUDA assertion error with some models
- c373b5d Fix LLaVA regression for square images
- 176e54f Fix server crash when prompt exceeds context size
Example Llamafiles
Our .llamafiles on Hugging Face have been updated to incorporate these new release binaries. You can redownload here:
- https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main
- https://huggingface.co/jartine/mistral-7b.llamafile/tree/main
- https://huggingface.co/jartine/wizardcoder-13b-python/tree/main
If you have a slower Internet connection and don't want to re-download, then you don't have to! Instructions are here:
llamafile v0.2
Warning: This release was rolled back due to a Windows breakage caused by jart/cosmopolitan@7b3d7ee. Please use llamafile v0.2.1.
llamafile v0.1
llamafile lets you distribute and run LLMs with a single file. This is our first release! See our README file for documentation.